Transverse Vector Search Experiment Memo 2023-09-20

We ran an experiment to expand the set of data searched by "Extending the Red Link with AI", which we have been running publicly on this Scrapbox.

The search targets now include other people's Scrapbox data, books, and papers.

Related

Cross-sectional Vector Search

Talk about putting books in Scrapbox.

Individual and aggregated loops

Since there might be some problems with publishing the generated results as they are without review, the output destination is a private project that is not open to the public.

A later update writes the search hits themselves to Scrapbox, which makes the output clearly unpublishable.

https://gyazo.com/bb1a0bd6cb151fa1b6ffa2d7c498744c

When a red link is created like this, the AI automatically generates the page at its destination.

Implementation

Load from local pickle

It takes about 5 seconds to read 100,000 records

About 7 seconds to perform a local vector search

That is slow for a web app response, but with my recent style of "throw in a keyword or page that comes to mind, work on something else, and look at the result after a while," it is not a problem.

About 10-25 minutes pass between when I throw the query and when I come back to check the results (I haven't measured it).

Updates have only run every 10 minutes from the start anyway, and I haven't seen a problem.
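The load-and-search step described above could be sketched roughly as follows. This is a minimal illustration, not the actual implementation: the record layout (a list of `(title, vector)` pairs), the file name `vectors.pickle`, the helper names, and the brute-force cosine similarity are all assumptions, and the three-dimensional vectors are toy data standing in for real embeddings.

```python
import math
import pickle
import tempfile
from pathlib import Path


def load_records(path):
    """Load a list of (title, vector) pairs from a local pickle file."""
    with open(path, "rb") as f:
        return pickle.load(f)


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(records, query_vec, top_k=3):
    """Brute-force search: score every record, return the top_k (score, title)."""
    scored = [(cosine(vec, query_vec), title) for title, vec in records]
    scored.sort(reverse=True)
    return scored[:top_k]


# Toy data (hypothetical titles and vectors), written to a temporary pickle.
records = [
    ("nishio/page A", [1.0, 0.0, 0.0]),
    ("tkgshn/page B", [0.0, 1.0, 0.0]),
    ("nishio-books/page C", [0.7, 0.7, 0.0]),
]
path = Path(tempfile.mkdtemp()) / "vectors.pickle"
with open(path, "wb") as f:
    pickle.dump(records, f)

hits = search(load_records(path), [1.0, 0.1, 0.0], top_k=2)
for score, title in hits:
    print(f"{score:.3f} {title}")
```

A linear scan like this costs time proportional to the number of records, which fits a workflow where results are checked minutes later rather than interactively.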

impressions

Should the search run over "all data" or over "everything except this project, /nishio"?

In my first experiment I generated output for the keyword "Combining knowledge is the source of new ideas, which in turn generates further knowledge and new combinations", which I had already done within this project. Almost all the hits were pages from my own project, which was not interesting, so I excluded it.

titles: ["🤖🔁 successful intelligence", "🤖🔁 successful intelligence", "🤖🔁 twist", "Hatena2015-01-07", "🌀Collaboration with AI", "Increase personal productivity first", "🌀successful intelligence", "tkgshn/ Karabiner not working on macOS Monterey"]

Maybe the last one is random.

Not excluding my own writing might surface unexpected connections, but since I am the one who writes the query, the hits naturally tend to be similar to what I have written.
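The choice between "all data" and "everything except my own project" can be expressed as a filter on the project prefix of each hit. The sketch below assumes hits are titled `project/page title`, as in the result lists on this page. The function names and the sample hits are hypothetical.

```python
def project_of(title):
    """Return the project part of a 'project/page title' string."""
    return title.split("/", 1)[0]


def exclude_projects(hits, excluded):
    """Drop hits whose project is in the excluded set (exact match on name)."""
    return [title for title in hits if project_of(title) not in excluded]


# Hypothetical hits for illustration.
hits = [
    "nishio/successful intelligence",
    "tkgshn/knowledge",
    "nishio-books/some book",
    "motoso/discussion",
]

print(exclude_projects(hits, set()))       # all data
print(exclude_projects(hits, {"nishio"}))  # everything except /nishio
```

Matching on the exact project name rather than a string prefix keeps `nishio-books` in the results when only `nishio` is excluded.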

I wonder if people find value in "Letting my data-derived AI live in my wiki" because it gives them a "different perspective".

I'm not sure why I find the value of a different perspective in searching my own Scrapbox myself.

Maybe it answers "It is A" with "Is it really A?", or maybe the same individual has a sufficiently varied point of view due to the accumulation of more than 10 years of experience including Hatena Diary, or maybe I am highly sensitive to differences because I habitually think "[Similarity -> What is the difference?]".

If you exclude yourself, it looks like this (the output is long, so I cut some unnecessary details)

titles: ["motoso/discussion", "tkgshn/knowledge and wisdom", "nishio-books/MOT knowledge creation management and innovation Ikujiro Nonaka", "tkgshn/knowledge tightly coupled with business", "mtane0412/suggestion through ignorance", "nishio-books/idea generation method and cooperative work support Jun Munemori", "tkgshn/knowledge", "nishio-llm2023/Study of methods and systems to support knowledge creation process", "blu3mo_filtered/dualism of knowledge and ability", "motoso/wise company", "blu3mo_filtered/accounting systems", "tkgshn/vaccine against false information", "motoso/Think different"]

It's interesting that the sources are pretty disparate.

But because of this variance, what the current prompt has to say becomes less interesting.

Notebooks and ... fragments are related in that the combination and exchange of knowledge generates new knowledge and ideas.

It mentions three of them, so it would be more interesting if it discussed how each one relates individually rather than lumping them all together.

I've already tried an improved prompt in nues implementation of this, so I'll be importing it back in the future.

Later, I tried other styles, such as having the AI comment on multiple lines of text instead of a single-line title, but the search results themselves are more interesting than the generated text.

It's not that the generated text is uninteresting; the vector search results are simply more interesting.

If the interestingness of the text generated from my own project's data is 100, the generated text in this experiment is about 90, and the vector search results are about 300.

I'm writing a summary of today's experiment first, holding back the urge to write about each of them individually.

I think the individual stories are overly detailed.

Well, if you can just imagine the book part as "a system where books pop up and related pages open when you talk about them in front of a bookshelf," you can understand how interesting it would be.

In the case of Scrapbox, the other person is still alive (and books can be alive, too, of course), so there is a possibility that the connection found by the AI here could lead to more conversation with that person afterwards.

So, the vector search results that were only output to the console are now written to Scrapbox as well.

This naturally led to parts of books and such being written in Scrapbox, making it impossible to publish directly.

I had anticipated from the start that something like this might not be possible in public, which is why the output was private by default.

In the end, you won't be able to see it.

The question of whether to include this part of the search results in the AI's search is a difficult one.

Effect of increasing number of fragments related to interests

Effect of fragments from search results for the same keyword happening to sit next to each other and interfering to create something new

Adverse effects of miscellaneous machine-generated data returning to input

Include them for now, and build a mechanism to mechanically remove them all at once if it turns out to be a bad idea.

2023-10-05: I think it is not good; the mismatch between the title and the content is confusing.
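One way the "include once, remove mechanically later" idea could work is to mark machine-generated fragments when they enter the index, so that they can all be dropped in one pass. This is only a sketch under assumptions: the `Fragment` class, the `generated` flag, and the helper names are hypothetical and not taken from the actual system.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    title: str
    text: str
    generated: bool = False  # hypothetical marker for machine-written output


def add_generated(index, title, text):
    """Add machine-generated search output to the index, marked as such."""
    index.append(Fragment(title, text, generated=True))


def purge_generated(index):
    """Return a new index with every machine-generated fragment removed."""
    return [frag for frag in index if not frag.generated]


# Toy index: one human-written fragment and one generated fragment.
index = [Fragment("human page", "written by a person")]
add_generated(index, "search result page", "hits written by the search job")

cleaned = purge_generated(index)
print([frag.title for frag in cleaned])
```

Because the marker is set at write time, the removal does not need to guess which pages were generated from their titles or content.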

summary

With just a casual post to a private project from a smartphone or the like, I can now see the results of vector searches across multiple people's Scrapbox projects, books, and papers.

We will improve it as we use it in the future.

It would be interesting to generate not only for explicitly triggered queries but also automatically, about once a day, from pages recently updated in the project.

---

This page is auto-translated from /nishio/横断ベクトル検索実験メモ2023-09-20 using DeepL. If you find something interesting but the auto-translated English is not good enough to understand it, feel free to let me know at @nishio_en. I'm very happy to spread my thoughts to non-Japanese readers.